[Figure: 1-bit PointNet pipeline, n×3 input → Transform → n×64 → Transform → n×1024 → MaxPooling → 1×1024 → output scores; real-valued FC layers alongside Bi-FC layers whose weights and activations are binarized with sign(·) and rescaled by α; forward propagation through the Bi-FC layer and backward propagation via EM+STE and STE.]
FIGURE 6.4
Outline of the 1-bit PointNet obtained by our POEM on the classification task. We keep the first and last fully connected layers real-valued (shown with horizontal stripes). We give the detailed forward and backward propagation process of POEM, where EM denotes the Expectation-Maximization algorithm and STE denotes the Straight-Through Estimator.
where we set $a_1 = -1$ and $a_2 = +1$. Then $P_{R \to B}(\cdot)$ is equivalent to the sign function, i.e., $\mathrm{sign}(\cdot)$.
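As a minimal sketch of this projection (assuming a PyTorch setting and the common convention of mapping zeros to +1), $P_{R \to B}(\cdot)$ can be written as:

```python
import torch

def project_to_binary(x: torch.Tensor) -> torch.Tensor:
    """P_{R->B}(x) with a1 = -1 and a2 = +1, i.e., the sign function.

    Zeros are mapped to +1 so every entry lands in {-1, +1}.
    """
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

x = torch.tensor([1.23, 0.12, -0.66])
print(project_to_binary(x))  # tensor([ 1.,  1., -1.])
```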
However, the binarization procedure achieved by $P_{R \to B}(x)$ is sensitive to disturbance when $x$ follows a Gaussian distribution, as in XNOR-Net. That is, the binarization results are susceptible to the noise in the raw point cloud data, as shown in Fig. 6.3. To address this issue, we first define an objective as
\[
\arg\min_{x} \; P_{R \to B}(x) - P_{R \to B}(x + \gamma), \qquad (6.37)
\]
where $\gamma$ denotes a disturbance.
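To make this sensitivity concrete, the short sketch below (PyTorch assumed; the disturbance scale 0.05 is illustrative) counts how many binarized entries flip their sign when a small Gaussian disturbance $\gamma$ is added; entries of $x$ close to zero flip easily:

```python
import torch

torch.manual_seed(0)

x = torch.randn(10_000)               # roughly Gaussian activations
gamma = 0.05 * torch.randn_like(x)    # small disturbance

# Fraction of entries whose binarized value changes under the disturbance
flipped = (torch.sign(x) != torch.sign(x + gamma)).float().mean()
print(f"fraction of binarized entries that flip: {flipped:.3f}")
```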
Another objective is defined to minimize the geometric distance between $x$ and $P_{R \to B}(x)$ as
\[
\arg\min_{x,\alpha} \; \| x - \alpha P_{R \to B}(x) \|_2^2, \qquad (6.38)
\]
where $\alpha$ is an auxiliary scale factor. Recent works on binarized neural networks (BNNs) [199, 159] solve this objective explicitly as
\[
\alpha = \frac{\| x \|_1}{\mathrm{size}(x)}, \qquad (6.39)
\]
where $\mathrm{size}(x)$ denotes the number of elements in $x$. However, this solution neglects that $\alpha$ also influences the output of the 1-bit layer. In contrast, we take this shortcoming into account and modify the learning objective in our POEM.
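For reference, before turning to POEM's modified objective, the following sketch (PyTorch assumed; the data and the search grid are illustrative) computes the closed-form scale factor of Eq. (6.39) and checks numerically that, with $x$ fixed, it minimizes the objective in Eq. (6.38):

```python
import torch

torch.manual_seed(0)
x = torch.randn(1024)

# Closed-form scale factor from Eq. (6.39): alpha = ||x||_1 / size(x)
alpha_closed = x.abs().sum() / x.numel()

# Brute-force check of Eq. (6.38): ||x - alpha * sign(x)||_2^2 over a grid of alphas
alphas = torch.linspace(0.0, 2.0, 2001)
losses = torch.stack([(x - a * torch.sign(x)).pow(2).sum() for a in alphas])
alpha_best = alphas[losses.argmin()]

print(f"closed-form alpha: {alpha_closed:.4f}, grid-search alpha: {alpha_best:.4f}")
```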
6.3.2 Binarization Framework of POEM
We briefly introduce the framework based on our POEM, as shown in Fig. 6.4. We extend the binarization process from 2D convolutions (XNOR-Net) to the fully connected layers (FCs) used for feature extraction, termed 1-bit fully connected (Bi-FC) layers, which rely on extremely efficient bit-wise operations (XNOR and bit-count) over lightweight binary weights and activations.
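A minimal sketch of such a Bi-FC layer is given below (PyTorch assumed; the class name BiFC and the plain straight-through estimator are our own illustrative choices, and the EM-based optimization that POEM adds is omitted). With $\{-1, +1\}$ operands, the matrix product in the forward pass is exactly what XNOR and bit-count operations implement:

```python
import torch
import torch.nn as nn

class BiFC(nn.Module):
    """Sketch of a 1-bit fully connected (Bi-FC) layer: binary weights and
    activations with a scale factor, trained with the straight-through
    estimator (STE). Illustrative only; POEM's EM-based weight
    reconstruction is omitted."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    @staticmethod
    def _binarize(x: torch.Tensor) -> torch.Tensor:
        # sign(x) in the forward pass, identity gradient (STE) in the backward pass
        return x + (torch.sign(x) - x).detach()

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        alpha = self.weight.abs().mean()   # scale factor, as in Eq. (6.39)
        bw = self._binarize(self.weight)   # binary weights
        ba = self._binarize(a)             # binary activations
        # With {-1, +1} operands, this matmul corresponds to XNOR + bit-count.
        return alpha * ba.matmul(bw.t())

layer = BiFC(64, 128)
out = layer(torch.randn(8, 64))   # e.g., 8 points with 64-dim features
print(out.shape)                  # torch.Size([8, 128])
```

In a full 1-bit PointNet, such layers would replace the intermediate FC layers, while the first and last layers remain real-valued as in Fig. 6.4.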